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Abstract 

In  this  chapter  we  explain  variance  reduction  techniques  from  the  Hilbert  space 
standpoint,  in  the  terminating  simulation  context.  We  use  projection  ideas  to  ex¬ 
plain  how  variance  is  reduced,  and  to  link  different  variance  reduction  techniques. 
Our  focus  is  on  the  methods  of  control  variates,  conditional  Monte  Carlo,  weighted 
Monte  Carlo,  stratification,  and  Latin  hypercube  sampling. 


1  Introduction 


The  goal  of  this  chapter  is  to  describe  variance  reduction  techniques  from  a 
Hilbert  space  perspective  in  the  terminating  simulation  setting,  with  the  fo¬ 
cal  point  lying  on  the  method  of  control  variates.  Several  variance  reduction 
techniques  have  an  intuitive  geometric  interpretation  in  the  Hilbert  space  set¬ 
ting,  and  it  is  often  possible  to  obtain  rather  deep  probabilistic  results  with 
relatively  little  effort  by  framing  the  relevant  mathematical  objects  in  an  ap¬ 
propriate  Hilbert  space.  The  procedure  employed  to  reach  most  results  in  this 
context  consists  of  three  stages: 

(1)  Find  a  pertinent  space  endowed  with  an  inner  product. 

(2)  Apply  Hilbert  space  results  towards  a  desired  conclusion. 

(3)  Translate  the  conclusions  back  into  probabilistic  language. 

The  key  geometric  idea  used  in  Stage  2  is  that  of  “projection” :  Given  an 
element  y  in  a  Hilbert  space  H  and  a  subset  M  of  the  Hilbert  space,  it  holds 
under  mild  conditions  that  there  exists  a  unique  element  in  M  that  is  closest 
to  y.  The  characterization  of  this  element  depends  on  the  space  H  determined 
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by  Stage  1,  on  y,  and  on  M;  in  essence  it  is  found  by  dropping  a  perpendicular 
from  y  to  M. 

The  Hilbert  space  explanation  of  control  variates,  and  to  a  somewhat  lesser 
extent  that  of  conditional  Monte  Carlo,  is  closely  related  to  that  of  other 
variance  reduction  techniques;  in  this  chapter  we  bridge  these  connections 
whenever  they  arise.  From  the  projection  perspective,  it  is  often  possible  to 
link  variance  reduction  techniques  for  which  the  relevant  Hilbert  space  H  and 
element  y  G  H  are  the  same.  The  articulation  is  done  by  judiciously  choosing 
the  subset  M  for  each  particular  technique. 

We  do  not  attempt  to  provide  a  comprehensive  survey  of  control  variates  or 
of  the  other  techniques  covered  in  this  chapter.  As  to  control  variates,  several 
publications  furnish  a  broader  picture;  see,  for  example,  Lavenberg  and  Welch 
(1981),  Lavenberg  et  al.  (1982),  Wilson  (1984),  Rubinstein  and  Marcus  (1985), 
Venkatraman  and  Wilson  (1986),  Law  and  Kclton  (2000),  Nelson  (1990),  Loh 
(1995),  and  Glasserman  (2004).  For  additional  material  on  other  variance  re¬ 
duction  techniques  examined  here,  refer  to  the  items  in  the  References  section 
and  to  references  therein. 

This  chapter  is  organized  as  follows:  Section  2  is  an  overview  of  control  vari¬ 
ates.  In  Section  3  we  review  Hilbert  space  theory  and  present  several  examples 
that  serve  as  a  foundation  for  the  rest  of  the  chapter.  Section  4  is  about  control 
variates  in  Hilbert  space.  The  focus  of  Section  5  is  on  the  method  of  conditional 
Monte  Carlo,  and  on  combinations  of  conditional  Monte  Carlo  with  control 
variates.  Section  6  describes  how  control  variates  and  conditional  Monte  Carlo 
can  reduce  variance  cooperatively.  The  subject  of  Section  7  is  the  method  of 
weighted  Monte  Carlo.  In  Sections  8  and  9  we  describe  stratification  tech¬ 
niques  and  Latin  hypercube  sampling,  respectively.  The  last  section  presents 
an  application  of  the  techniques  we  investigate.  As  stated  above,  the  focus 
of  this  chapter  is  in  interpreting  and  connecting  various  variance  reduction 
techniques  in  a  Hilbert  space  framework. 


2  Problem  Formulation  and  Basic  Results 


We  study  efficiency  improvement  techniques  for  the  computation  of  an  un¬ 
known  scalar  parameter  a  that  can  be  represented  as  a  =  EY ,  where  Y  is 
a  random  variable  called  the  response  variable,  in  the  terminating  simula¬ 
tion  setting.  Given  n  independent  and  identically  distributed  (i.i.d.)  replicates 
Yi, . . . ,  Yn  of  Y  produced  by  the  simulation  experiment,  the  standard  estimator 
for  a  is  the  sample  mean 


Y 


1 

n 


£  u. 


fc=i 
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The  method  of  control  variates  (CVs)  arises  when  the  simnlationist  has  avail¬ 
able  a  random  column  vector  X  =  {X\, . . . , Xfi)  e  Md,  called  the  control,  such 
that  X  is  jointly  distributed  with  Y,  EX  =  /ix  is  known,  and  it  is  possible  to 
obtain  i.i.d.  replicates  (bj ,  Xx), . . . ,  (Yn,  Xn)  of  (Y,  X)  as  a  simulation  output. 
Under  these  conditions,  the  CV  estimator  is  defined  by 

bbv(A)  =  F-AT(X-/ix),  (1) 

where  A  =  (Ax, . . . ,  Ad)  €  is  the  vector  of  control  variates  coefficients;  -T 
denotes  transpose,  vectors  are  defined  as  columns,  and  vectors  and  matrices 
are  written  in  bold. 


The  following  holds  throughout  this  chapter: 


Assumption  1  E(Y2  +  J2i=i  X?)  <  oo  and  the  covariance  of  (Y,  X),  defined 
by 


X 
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is  non-singular. 


The  first  part  of  Assumption  1  is  satisfied  in  most  settings  of  practical  interest. 
Furthermore,  when  X  is  singular  it  is  often  possible  to  make  it  non-singular 
by  reducing  the  number  of  controls  X;  see  the  last  paragraph  of  Example  5. 

Naturally,  A  is  chosen  to  minimize  Var  Yqv  ( A) ,  which  is  the  same  as 

minimizing  a2  —  2A7  Xx2/  +  A7  Xxx  A.  (2) 

The  first  and  second-order  optimality  conditions  for  this  problem  imply  that 
there  exists  a  unique  optimal  solution  given  by 

A’  =  S^SXV.  (3) 


With  this  choice  of  A  =  A*  the  CV  estimator  variance  is 

Var  Yen  (A*)  =  Var  Y(1  -  R2yx),  (4) 


where 


v  v-1  v 

rp2  _ 

W/x  _  n2 

V 

is  the  square  of  the  multiple  correlation  coefficient  between  V  and  X.  CVs 
reduce  variance  because  0  <  Ryx  <  1  implies  Var  Ycv  <  Var  Y  in  (4).  The 
central  limit  theorem  (CLT)  for  Yqv  asserts  that,  under  Assumption  1, 


n1/2iXcv{^*)  -  a)  =>  N(0,  <JcV), 
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where  a^v  =  cr^(l  —  R^x),  =>•  denotes  convergence  in  distribution,  and  N( 0,  a2) 
is  a  zero-mean  Normal  random  variable  with  variance  a2. 

In  general,  however,  the  covariance  structure  of  the  random  vector  ( Y. ,  X)  may 
not  be  fully  known  prior  to  the  simulation.  This  difficulty  can  be  overcome  by 
using  the  available  samples  to  estimate  the  unknown  components  of  E,  which 
can  then  be  used  to  estimate  A*.  Let  A„  be  an  estimator  of  A*  and  suppose 
that  An  A*  as  n  — »  oo.  Then,  under  Assumption  1, 

n1/2(Ycv( An)  -a)=>  N{ 0,  a2cv),  (5) 

as  n  — >  oo ;  see  Glynn  and  Szechtman  (2002)  for  details.  Equation  (5)  means 
that  estimating  A*  causes  no  loss  of  efficiency  as  n  — >•  oo,  if  A„  A. 


Thus  far  we  have  only  considered  linear  control  variates  of  the  form  Y — AT(X— 
Hx).  In  some  applications,  however,  the  relationship  between  the  response  and 
the  CVs  is  non-linear,  examples  of  which  are:  Vexp(A7(X  —  /ix)),  YX/fix , 
and  Y^lx .  In  order  to  have  a  general  representation  of  CVs  we  introduce  a 
function  /  :  Md+1  — ■»  M  that  is  continuous  at  (y,Hx)  with  )  =  y.  This 

property  ensures  that  /(V,X)  — >  a  a.s.  if  (y,x)  ^  (a,  /j,x)  a.s.,  so  we  only 
consider  such  functions. 

The  limiting  behavior  of  f(Y,  X)  is  characterized  in  Glynn  and  Whitt  (1989, 
Theorem  9)  under  the  assumption  that  the  i.i.d.  sequence  {(Vn,  Xn)  :  n  >  1} 
satisfies  the  CLT  y/n((Y,  X)  —  ( a ,  /txx))  N( 0,  E),  and  that  /  is  continuously 
differentiable  in  a  neighborhood  of  (a,  /ix)  with  first  partial  derivatives  not  all 
zero  at  (a, /zx).  Then 


Vn(f(Y ,  X)v—  a)  =>  N(Q,  a2),  (6) 

as  n  — >  oo,  where  aj  is  given  by 

=  al  +  2Vx/(«,  VxV'Zxy  +  Vx/(a,  /ix)TS xxVx/(a,  /Ltx),  (7) 

and  Vx/(y,x)  G  is  the  vector  with  ith  component  df/dxi(y,  x). 

The  asymptotic  variance  is  minimized,  according  to  Equation  (2)  with 
Vx/(a,/ix)  in  lieu  of  A,  by  selecting  f*  such  that  Vx/*(a,^x)  =  — EXXEX?; 
in  (7);  that  is,  <7^*  =  cr|(l  —  i?^x).  In  other  words,  non-linear  CVs  have  at 
best  the  same  asymptotic  efficiency  as  YCv( A*).  Notice,  however,  that  for 
small  sample  sizes  it  could  happen  that  non-linear  CVs  achieve  more  (or  less) 
variance  reduction  than  linear  CVs. 

The  argument  commonly  used  to  prove  this  type  of  result  is  known  as  the 
Delta  method;  see  Chapter  2  or,  for  a  more  detailed  treatment,  Serffing  (1980, 
p.  122).  The  reason  why  y/n(f(Y,  X)  —  a)  converges  in  distribution  to  a  normal 
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random  variable  is  that  /  is  linear  in  a  neighborhood  of  (cp  px)  because  /  is 
differentiable  there,  and  a  linear  function  of  a  normal  random  variable  is  again 
normal. 


To  conclude  this  section,  for  simplicity  let  the  dimension  d  —  1  and  suppose 
that  only  an  approximation  of  fix ,  say  7  =  fix  +  e  for  some  scalar  e,  is  known. 
This  is  the  setting  of  biased  control  variates  (BCV).  The  BCV  estimator  is 
given  by 

>W(A)  =  F-A(X-7). 

The  bias  of  Ybcv(A)  is  Ae,  and  the  mean-squared  error  is 

E(YBcvW  ~  a)2  —  Vary  +  X2E(X  —  y)2  —  2A  Cov(F,  X). 
Mean-squared  error  is  minimized  by 

An  =  Cov(?,W)/£(W-y)2,  (8) 


and 


e(ybcv(K)  -  «)» =  v„r  (1  -  vjff/_)a7)a)  ■ 

which  is  (4)  when  e  =  0;  see  Schmeiser  et  al.  (2001)  for  more  details  on  BCVs. 


3  Hilbert  Spaces 


We  present  basic  ideas  about  Hilbert  spaces,  mainly  drawn  from  Kreyszig 
(1978),  Bollobas  (1990),  Zimmer  (1990),  Williams  (1991),  and  Billingsley  (1995) 
We  encourage  the  reader  to  consult  those  references  for  proofs,  and  for  addi¬ 
tional  material.  We  illustrate  the  concepts  with  a  series  of  examples  that  serve 
as  foundational  material  for  the  rest  of  the  chapter. 

An  inner  product  space  is  a  vector  space  V  with  an  inner  product  (x,  y)  defined 
on  it.  An  inner  product  on  V  is  a  mapping  of  V  x  V  into  M  such  that  for  all 
vectors  x,  y,  z  and  scalars  a ,  f3  we  have 

(i)  (ax  +  /3y,z)  =  a(x,z)  +0(y,z). 

(ii)  ( x ,  x)  >  0,  with  equality  if  and  only  if  x  —  0. 

(iii)  (x,y)  =  (y,x). 

An  inner  product  defines  a  norm  on  X  given  by 

INI  =  \/M'  (9) 

A  Hilbert  space  H  is  a  complete  inner  product  space,  complete  meaning  that 
every  Cauchy  sequence  in  H  has  a  limit  in  H. 
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The  next  three  examples  present  the  Hilbert  spaces  that  will  be  employed 
throughout  this  chapter. 

Example  1  Let  be  a  probability  space  and 

£2(n,f,V)  =  jy  G  {n,F,V)  :  EY2  =  J^Y(ou)2dV(ou)  <  ooj 

the  space  of  square-integrable  random  variables  defined  on  (H,  T  ,V).  For  X,  Y  e 
£2(H,  IF,  V),  the  inner  product  is  defined  by 

(X,Y)  =  E(XY)  =  [  X(u)Y(u)dV(u),  (10) 

Jn 

and  the  norm  is  given  by 

I!*!  =  ^TeY2=  ^Y(u)2dV(uj)Sj1/2 ,  (11) 

by  (9).  It  can  be  easily  verified  that  the  inner  product  defined  by  (10)  has 
properties  (i),  (ii),  and  (in).  The  space  C2{LL,F  ,V)  is  complete  under  this 
norm  (Billingsley,  1995,  p.  243).  Note  that 

Vary  =  ||y-£y||2.  (12) 

Example  2  The  space  is  the  set  of  vectors  x  =  (xi, . . .  ,xn)  in  and 
can  be  made  into  a  Hilbert  space  by  defining  the  inner  product  for  x,yGK" 
as 

n 

(x>y  )  =  Y,xiyi-  (13) 

3= 1 

The  norm  induced  by  (13)  is 

INI  =  >/(x,x)  =  • 

The  space  M"  is  complete  under  this  ?iorm  (Bollobas,  1990,  p.  133). 

Example  3  Consider  independent  random  variables  X,  with  distribution  func¬ 
tion  Ffixf),  1  <  i  <  d,  and  define  F(x)  =  Ilf=i  Ffixf),  for  x  =  (x\, . . .  ,Xd). 
Write  X  =  (X1? . . .  ,Xd),  and  let  f  :  Rd  — >  M  be  a  Borel-measurable  function 
in  C2(dF),  the  space  of  square  integrable  functions  with  respect  to  F.  This 
space  can  be  made  into  a  Hilbert  space  by  defining  the  inner  product: 

(f,g)  =  J  f(x)ff(x)dF(x),  (14) 

for  any  f,  g  G  jC2(dF).  The  norm  induced  by  (14)  is 

ll/ll  =  (/ /(x)2dF(x))  , 
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and  C2(dF)  is  complete  under  this  Jiorm  (Billingsley,  1995,  p.  243). 

The  notion  of  orthogonality  among  elements  lies  at  the  heart  of  Hilbert  space 
theory,  and  extends  the  notion  of  perpendicularity  in  Euclidean  space.  Two 
elements  x ,  y  are  orthogonal  if 


(x,y)  =  0. 


From  here,  there  is  just  one  step  to  the  Pythagorean  theorem: 


Result  1  (Pythagorean  Theorem)  Kreyszig  (1978).  If  x\, ...  ,xn  are  pairwise 
orthogonal  vectors  of  an  inner  product  space  V  then 


Y.P 

i— 1 


£ 


(15) 


Let  us  write 

x±  =  {y  G  V  :  (x,y)  =  0} 
for  the  set  of  orthogonal  vectors  to  x  G  V,  and 

S1  =  {y  G  V  :  (x,y)  =  0,  Wx  G  S} 

for  S  C  V.  Finally,  a  set  (aq, . . .  ,xn)  of  elements  in  V  is  orthogonal  if  all  its 
elements  are  pairwise  orthogonal. 

We  often  work  with  a  subspace  S  of  a  Hilbert  space  H  defined  on  X ,  by  which 
we  mean  a  vector  subspace  of  X  with  the  inner  product  restricted  to  S'  x  S. 
It  is  important  to  know  when  S  is  complete,  and  therefore  a  Hilbert  space.  It 
is  easy  to  prove  that  S  is  complete  if  and  only  if  S  is  closed  in  H. 

Consider  a  non-empty  subset  M  of  an  inner  product  space  V.  Central  to  the 
concepts  discussed  in  this  chapter  is  to  know  when,  given  a  point  y  in  V,  there 
exists  a  unique  point  x  G  M  that  minimizes  the  distance  from  y  to  M,  where 
the  distance  from  y  to  M  is  defined  to  be  d(y,M )  =  inf„eM  \\y  —  v\\.  The 
following  result  provides  an  answer  to  this  problem. 

Result  2  (Projection  Theorem)  Kreyszig  (1978).  Let  M  be  a  complete  sub¬ 
space  of  an  inner  product  space  V ,  and  let  y  G  V  be  fixed.  Then  there  exists  a 
unique  x  =  x(y)  G  M  such  that 

d(y,  M)  =  \\y-x\\. 

Furthermore,  every  y  G  V  has  a  unique  representation  of  the  form 

y  =  x  +  z, 
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Fig.  1.  Orthogonal  Projection 
where  x  G  M  and  z  G  .  Then 

(x,  y  —  x)  =  (x,  z)  =  0.  (16) 


The  second  part  of  Result  2  implies  that  if  M  is  a  complete  subspace  of  an 
inner  product  space  V,  then  there  exists  a  function  Pm  :  V  — >  M  defined  by 
PmV  =  x.  We  call  Pm  the  orthogonal  projection  of  V  onto  M,  see  Figure  1. 

Among  the  properties  that  the  projection  functional  enjoys  are: 

a)  Pmx  =  x,  for  all  x  G  M. 

b)  Pm z  =  0,  for  all  z  G  ML . 

c)  11-^  —  -Pm  1 1  A  1- 

Applying  Result  2  to  Examples  1,  2,  and  3  leads  to  several  variance  reduction 
techniques  in  the  Hilbert  space  setting.  The  following  example  forms  the  basis 
for  the  method  of  conditional  Monte  Carlo. 

Example  4  In  the  setting  of  Example  1,  consider  a  sub-a -algebra  Q  of  T .  The 
set  of  square  integrable  random  variables  defined  on  £2(Q,Q ,V)  is  a  complete 
subspace  of  £2(Q,  T ,V).  Therefore,  forY  G  £2(fl,jF.  V)  fixed,  there  exists  an 
element  Z  G  £2(H,Q,V)  that  is  the  closest  point  to  Y  in  £2(Tl,Q,V)  and  for 
which 

(Y-Z,W)  =  0,  (17) 

for  W  G  C2(Tl,Q,V)  arbitrary.  Choosing  W  =  Ir,  B  G  Q ,  Equation  (17) 
shows  that  Z  is  the  conditional  expectation  of  Y  given  Q,  Z  =  E(Y\Q).  We 
also  can  write  PgY '  =  E(Y\Q);  see  Williams  (1991)  for  more  details. 

Observe  that  Equation  (17)  and  the  Pythagorean  theorem  imply  that 


l|r||2  =  \\y  -  z\\2  +  \\zf. 


(18) 


Using  Equation  (12),  centering  Y  so  that  EY  =  0,  we  have  that  Equation  (18) 
is  the  variance  decomposition  formula  (Law  and  Kelton,  2000): 

Var  Y  =  E  Var(Y  \Q)  +  Var  E(Y\Q),  (19) 

where  Var(Y|£)  =  E(Y2\Q)  -  (E(Y\Q))2. 

We  continue  with  an  example  with  a  view  towards  control  variates. 

Example  5  For  elements  X{, ... ,  Xd  e  C2{Ut,  P ,V)  with  zero  mean  (oth¬ 
erwise  re-define  Xi  :=  X*  —  EXi),  define  M  =  {Z  e  C2(H,P,V)  :  Z  = 
Yfi=i  ffXtl  for  all  Pi  e  M,  i  —  1, . . .  ,d}.  For  Y  e  C2(Ul,P,V),  Result  2  then 
guarantees  the  existence  of  constants  P(  =  Pl(Y), . . . ,  P*d  =  P%(Y)  such  that, 

d  d 

PM(Y-EY)  =  '£ftXi,  and  (I-Pm)(Y-EY)  =  (Y-EY)-J2P*Xl.  (20) 

i= 1  i=  1 

If  Y  —  EY  G  M ,  using  property  a)  of  the  projection  operator  we  obtain  (/  — 
Pm)(Y  —  EY')  =  0 ,  so  that 


Var  (y-X>*w)  =0.  (21) 

If  the  elements  Xi, . . . ,  Xd  form  an  orthogonal  set,  applying  Equation  (15)  we 
have 

||(J  -  Pm)(Y  -  EY) ||2  =  \\Y  -  EY\\ 2  -  || Pm(Y  -  EY) ||2 

=  \\Y-EYf-'tt3f\\Xi\\\ 


so  that 


Var  [Y-Y^PtXi 


1=1 


d 

Var  Y  —  ^2  P*2  Var  Xt. 

i=  1 


(22) 


When  the  elements  X1, . . . ,  Xd  are  not  mutually  orthogonal  but  linearly  inde¬ 
pendent,  the  Gram-Schmidt  process  (Billingsley,  1995,  p.  2f9)  yields  an  or¬ 
thogonal  set  with  the  same  span  as  X\, . . .  ,Xd-  In  case  the  Xi’s  are  linearly 
dependent,  at  least  one  Xi:  can  be  expressed  as  a  linear  combination  of  the  oth¬ 
ers,  and  eliminated  from  the  set  Xi, . . . ,  Xd.  By  noticing  that  Co v(Xi,Xj)  = 
(Xi,  Xj)  we  gather  that  X1? . . . ,  Xd  are  linearly  independent  if  and  only  if  their 
covariance  matrix  is  non-singular,  and  X\, . . .  ,Xd  are  mutually  orthogonal  if 
and  only  if  their  covariance  matrix  has  all  its  entries  equal  to  zero  except  for 
positive  numbers  on  the  diagonal. 

We  can  extend  Example  5  to  the  setting  of  biased  control  variates. 
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Example  6  Let  X  G  £2(fl,  T,V),  and  M  =  {Z  e  jC2(Q,E,V)  :  Z  =  (3(X  - 
7 ),V/?  G  M},  7  7^  EX.  Fix  an  element  Y  G  C2(Ft,  X ,V)  and  let  a  =  EY . 
Project  Y  —  a  on  M: 

Pm{Y  -  a)  =  p*(X  -  7),  and  (/  -  PM)(Y  —  a)  =  Y  —  a  —  /3*(X  —  7), 

for  some  (3 *  =  fd*(Y)  G  M.  As  in  the  last  example  we  have 

||(J  -  Pm)(Y  -  a)||2  =  \\Y  -  a||2  -  || PM(Y  -  a)||2 

and,  since  ((/  —  Pm)(Y  —  a),  X  —  7)  =  0,  it  follows  that 

(Y-a,  jf-7> 
l|A'-7ll2  ' 

The  Pythagorean  theorem  applied  to  the  denominator  in  the  last  equation  yields 

\\X  -  if  =  II*  -  EX ||2  +  || EX  -  7||2,  (23) 

which  is  known  as  the  bias-variance  decomposition  formula. 

The  following  example  is  geared  to  the  method  of  control  variates  when  the 
optimal  control  coefficient  is  estimated  from  the  sample  data. 

Example  7  Let  x  =  (27, . . . ,  xn)  and  y  =  (yi, ... ,  yn)  be  elements  of  Mn  ( cf . 
Example  2),  and  define  S  =  { z  6  1"  :  z  =  fix,  V/3  el}.  By  Result  2  we  have 

Psy  =  finX  and  (/  -  Ps) y  =  y  -  /3nx, 

for  some  (3n  =  (3n{ y).  Because  (y  —  /5nx,  x)  =  0,  for  ||x||  >  0, 


We  now  set  the  stage  for  the  method  of  Latin  hypercube  sampling. 

Example  8  Building  on  Example  3,  let  M  =  {h  G  C2(dF)  :  h(x)  =  Yfl=  1  hi(xf)} 
be  the  subspace  of  C2(dF)  spanned  by  the  linear  combinations  of  univariate 
functions  h±, ...  ,hd-  Because  M  is  complete,  appealing  to  Result  2  establishes 
the  existence  of  an  element  h*  =  h*(f)  G  M,  h*{x)  =  Yfn= 1  K(xi)>  suc h  that 
||/  —  h*  ||  =  inf heM  11/  —  h ||  for  f  G  C2(dF)  fixed,  and  of  a  projection  oper¬ 
ator  PM  :  PMf  =  h* .  Similarly,  for  Mt  =  {h  G  C2(dF)  :  /i(x)  = 

1  <  i  <  d,  there  exists  g*  =  g*(f )  G  A/:  ||/  -  g* ||  =  infheMi  11/  -  ^|| 
a  projection  Pi  :  Pif  =  g* .  To  complete  the  picture,  define  the  subspace 
M0  =  {/3  G  M  :  \/3\  <  cxo}  which  induces  the  projection  Pq  :  Pof  =  g$,  for 
9o  '■  \\f  ~  9o\\  =  inf/3eMo  11/  -  P\\-  We  now  have: 
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-  Let  p;  =  er({K  xRx---xRx5xKX'"XK:5  G  B}),  where  B 

is  the  Borel  cr— field,  and  let  Po  be  the  trivial  cr -algebra  {0,M}.  For  each 
Pi,  we  know  that  (/  —  Ptf,  /q  =  0  for  any  h  G  Mt;  choosing  h(x)  = 
hi(xi)  =  B  G  B,  shows  that  Pjf  =  P(/(X)|Pj)  for  1  <  i  <  d,  and 

Pof  =  P(/(X)|P0)  =  Ef(X). 

-  Suppose  Pof  =  0.  Then  (/  —  Pm )f  G  Mf~  implies  g*  =  Pif  =  PiPuf  = 
h* ,  which  results  in 

d  d 

PM  =  Y,pu  andPMf  =  YJE(f(X)\Pl).  (24) 

i= 1  i=l 

For  general  P0f  ^  0,  (24)  becomes 

Pm  =  Po  +  E(^-Po),  and  PMf  =  P/(X)  +  E(P(/(X)|P,) -Ef{X)). 

i= 1  i= 1 

(25) 

The  next  result,  a  variant  of  Result  2,  will  be  useful  when  we  consider  the 
method  of  weighted  Monte  Carlo. 

Result  3  Kreyszig  (1918)  Suppose  M  0  is  a  closed  convex  subset  of  a 
Hilbert  space  H .  Then  for  x  G  H  fixed,  X\  =  Pm(x)  is  the  (unique)  closest 
point  in  M  to  x  if  and  only  if 


(x  —  Xi,y  —  Xi)  <  0,  Vy  G  M. 


(26) 


Later  in  the  chapter  we  will  deal  with  sequences  of  projections,  say  (Pn), 
defined  on  a  Hilbert  space  H  that  are  monotone  increasing  in  that 

||P^||  <  ||P(+i^||,  for  *  =  1,2,..., 

and  x  G  H  arbitrary.  Using  the  completeness  of  H  it  can  be  shown  that  (Pn) 
converges  in  the  following  sense: 

Result  4  Kreyszig  (1918)  Let  (Pn)  be  a  monotone  increasing  sequence  of  pro¬ 
jections  Pn  defined  on  a  Hilbert  space  H .  Then,  for  any  x  G  H , 

|| Pnx  -  Px ||  -»•  0, 

and  the  limit  operator  P  is  a  projection  on  H . 

An  immediate  application  of  this  result  is  the  following  example. 

Example  9  Suppose  that  {Pn)  is  an  increasing  sequence  Pi  C  JF2  C  . . .  C 
Poo  °f  a -algebras  such  that  p*,  =  a{\Jf(=1lFn) .  Then  associated  with  every 
£2(f2,P„,P)  there  exists  a  projection  Pn  :  £2(U,P,  V)  — >  £2(U,Pn,P),  and 
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the  sequence  of  projections  (. Pn )  is  monotone  increasing.  Let  P, x  be  the  pro¬ 
jection  that  results  from  applying  Result  j:  ||PnW  —  PooW||  — >  0  for  any 
W  G  £2(fl,  P ,  V),  Poo  C  P .  Because  (/  —  Poo)W  G  £2(fl,  Pn,  V)L ,  we  have 

f  WdP  =  f  PocWdP  (27) 

Jb  Jb 

for  any  B  G  Pn.  A  standard  n  —  X  argument  (Durrett,  1996,  p.  263)  shows  that 
(27)  holds  for  B  G  Pa 0  arbitrary;  that  is,  PooW  =  E(W \Poo).  The  conclusion 
is 

\\EiW\Poo)  -  E(W\Pn)\\  ->0,  (28) 

as  n  — >  oo. 


We  will  appeal  to  Example  9  when  dealing  with  stratification  techniques.  A 
variation  of  the  last  example  is 

Example  10  Suppose  that  X  is  random  variable  with  known  and  finite  mo¬ 
ments  EX\i  =  1,2,....  Define  a  sequence  of  complete  subspaces  (Md)  of 
C2(<A,a(X),V)  by 

Md  =  \z  G  £2( n,  a(X),P)  :  z  =  J2 &(Xi  ~  EX^yfc  G  R,  1  <  i  <  d  j  , 

for  d  =  1,2,....  Clearly  Mi  C  M2  C  ...  C  where  =  U^1Mi. 

Associated  with  each  Md  there  is,  by  Result  2,  a  projection  operator  Pd  with 
range  on  Md  such  that  for  W  G  C2{Ll,P,V),  a(X)  C  P ,  with  EW  =  0: 

d 

PdW  =  J2fi(xi-EXi),  (29) 

i= 1 

for  some  constants  (3*  =  /3*(W),  1  <  i  <  d,  possibly  dependent  on  d  (although 
this  is  not  apparent  from  the  notation).  Because  the  sequence  of  operators  (Pd) 
is  (clearly)  monotone  increasing.  Result  f  ensures  the  existence  of  a  projection 
Poo  in  £2(fl,  cr(X),  V)  that  satisfies 


PooW  -  PdW ||  -  0, 


(30) 


as  d  — >  oo.  Proceeding  like  in  the  last  example  it  follows  that  PooW  =  E{W\X), 
for  W  G  C2(D,P,  V)  arbitrary.  In  other  words. 


1=1 


as  d  — >  oo,  by  Equations  (29)  and  (30). 


(31) 


We  will  use  the  last  example  in  Section  6  to  show  how  conditional  Monte  Carlo 
and  control  variates  reduce  variance  cooperatively. 
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The  rest  of  the  chapter  is  devoted  to  provide  an  interpretation  of  this  section’s 
examples  in  terms  of  variance  reduction  techniques.  We  start  with  the  method 
of  control  variates. 


4  A  Hilbert  Space  Approach  to  Control  Variates 


We  build  on  the  sequence  of  examples  from  the  previous  section;  Glynn  and 
Szechtman  (2002)  is  a  relevant  reference  for  the  issues  discussed  in  this  section. 

Consider  the  setting  of  Example  5:  Ad, ... ,  Ad  are  zero-mean  square  integrable 
random  variables  with  non-singular  covariance  matrix  Exx  (although  this  was 
not  needed  in  Example  5),  and  defined  on  the  same  probability  space  as  the 
response  Y,  EY 2  <  oo.  The  goal  is  to  estimate  a  =  EY  by  averaging  n 
i.i.d.  replicates  of  Y  —  Ylf=\  X,Xt  to  obtain  Ycv{ A)  as  in  Equation  (1).  Clearly, 
Var  Ycv (A)  =  1/n  Var (Y  -  £?=1  A* Xt) . 

From  Example  5,  Ycv{ A*)  is  the  remainder  from  projecting  Y  on  M;  as 
M  “grows”  the  norm  of  the  remainder  decreases.  Also,  because  the  scalars 
Xl,...,\*d  that  minimize  Var(V  —  £f=1  A* A*)  are  also  the  numbers  that  result 
from  projecting  Y  into  M,  from  Result  2  and  Equation  (16)  we  know  that 

d 

(y-^\%z)  =  o,vzeM. 

i=  1 

In  particular, 

d 

(V-E  KXk)  =  0,  for  k  =  1, . . . ,  d.  (32) 

i= 1 

Therefore,  since  Cov(Y, Xj)  =  ( Y,Xj ),  and  Co v(Xi,Xj)  =  (Xt , X3 )  for  1  < 

i,j  <  d, 

K  =  (SxxSxy)j,  for  i  =  1, . . . ,  d, 

in  concordance  with  Equation  (3).  When  Xl} ,  Xd  is  an  orthogonal  set, 
Equation  (32)  yields 


{Y,Xj) 

{Xi,Xi) 


Cov(V,  Xi) 
Var  Xi 


1, . . . ,  d. 


(33) 


We  re-interpret  the  results  from  Examples  5  through  7  in  the  CV  context: 
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a)  Va rVcv(A*)  <  Vary  because 


Var  (y  -  J2  KxiJ  =  || (/  -  Pm)(Y  -  EY) ||2,  by  Equation  (20) 

<  \\I  -  PMf\\Y  -  EYf 

<  Var  y, 


by  using  the  projection  operator  properties  of  the  last  section. 

b)  If  Y  can  be  expressed  as  a  linear  combination  of  X\, . . . ,  Xn  then  Var(y  — 

i  A iXi)  =  0  for  some  Ai, . . . ,  Ad',  this  is  Equation  (21). 

c)  If  the  controls  X  i . . . . .  Xd  are  mutually  orthogonal,  then 


Va  r  V 


d 

E 

i=  1 


A,X 


d 

Var  y  —  E 

i=  1 


Cov(y,  x)2 
VarX 


VarK  -  fi) 


5 


by  Equations  (22)  and  (33),  where  pyXi  is  the  correlation  coefficient  be¬ 
tween  Y  and  X,  i  =  1,  •  •  • ,  d. 

d)  With  biased  control  variates  in  mind,  apply  Example  6  to  the  elements 
(F  —  a)  and  (X  —  7)  to  get  the  optimal  BCV  coefficient 

An  =  Cov(y,X)/£(X-7)2, 


as  expected  from  (8).  That  is,  BCVs  as  presented  in  Section  2  arise  from 
taking  the  remainder  of  the  projection  of  Y  —  a  on  the  span  of  X  —  7. 
Because  of  (23)  we  have 

Var Ycv(A*)  <  MSEyBCV'(A„). 


e)  Consider  the  setting  of  Example  7:  There  exists  a  zero-mean  control  vari¬ 
ate  X  G  M,  and  the  output  of  the  simulation  are  the  sample  points 
y  =  (Vi,  ■■■ ,Vn )  and  x  =  (aq, . . . ,  xn).  Let  y  =  (y1  -  y, . . . ,  yn  -  y),  and 
define  the  estimator 

1  n 

Ycvi.An)  —  y  ](Vj  A nXj), 

n  3= 1 

where  An  =  (x, y)/||x||2.  From  Example  7  we  know  that  An  arises  from 


14 


projecting  y  on  the  span  of  x:  Psy  =  Anx.  Now,  the  sample  variance  is 
1  "  .  1 

-  y(y,  -  \nXj  -  Ycv(\n))2  =  -||(7  -  Ps)y||2  -  (Ycv( A„)  -  y)2 
n  n 

=  i(l|y||2-A;i|x||  2)+0(n-‘) 

=  -|ly||2(l  -4.x)  +  0(n_1), 

n  * 

where  Py,x  =  (x,  y)/||x||  ||y  ||,  which  makes  precise  the  variance  reduction 
achieved  by  projecting  y  on  the  span  of  x  relative  to  the  crude  estimator 
sample  variance  ||y||2/n. 

Finally,  we  remark  that  there  is  no  impediment  in  extending  items  a)  through 
e)  to  the  multi-response  setting,  where  Y  is  a  random  vector. 


5  Conditional  Monte  Carlo  in  Hilbert  Space 


In  this  section  we  address  the  method  of  conditional  Monte  Carlo,  paying 
special  attention  to  its  connection  with  control  variates;  we  follow  Avramidis 
and  Wilson  (1996),  and  Loh  (1995). 

Suppose  Y  G  C2{yi,P,V)  and  that  we  wish  to  compute  a  =  EY\  Let  A"  e 
P.V)  be  such  that  E(Y\X)  can  be  analytically  or  numerically  com¬ 
puted.  Then 

1  n 

YCMC=-Y,E(Y\Xi) 

nJ= 1 

is  an  unbiased  estimator  of  a,  where  the  E(Y\Xj)  are  found  by  first  obtaining 
i.i.d.  samples  Xt  and  then  computing  E(Y\Xi).  We  call  Ycmc  the  conditional 
Monte  Carlo  (CMC)  estimator  of  a;  remember  that  according  to  Example  4, 
Ycmc  results  by  projecting  Y  on  £2(h2,  cr(X),  V).  The  variability  of  Ycmc  is 
given  by 

Var  Ycmc  =  —  Var  £,(T'|X), 
n 

with  Equation  (19)  implying  that  Var  Ycmc  <  Var  Y .  Specifically,  CMC  elim¬ 
inates  the  E  Var(V | X)  term  from  the  variance  of 

Sampling  from  Y  —  A (Y  —  E(Y\X ))  also  provides  an  unbiased  estimator  of  a 
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for  any  A  6  R.  By  Equation  (3), 


(Y}(I-Pa{x])Y) 

||  (I-Pa{x))Yr 

(Pg(X)Y  +  (/  -  P<X))Y}  (/  -  Pa[x])Y)  (34) 

\\(I  -  Pa{x))Y\\2 


This  shows  that  CMC  is  optimal  from  a  CV  perspective.  Avramidis  and  Wil¬ 
son  (1996),  and  Loh  (1995),  generalize  this  approach:  Let  Z  be  a  zero-mean 
random  variable  in  £2(f2,  E,  P),  and  X  a  random  variable  in  £2(11,JF,  V)  for 
which  both  E(Y\X )  and  E(Z\X)  can  be  determined.  Then  sampling  from 

Y  WAi(y  -  E(Y\X))  -  X2E(Z\X)  -  X3Z  (35) 


can  be  used  to  form  the  standard  means  based  estimator  for  a,  for  all  Ai,  A2,  A3  € 
M.  Repeating  the  logic  leading  to  (34)  we  obtain 


A 


* 

1 


i,a; 


Cov(E(Y\X),E(Z\X)) 

VaiE(Z\X) 


and  A3 


0. 


The  conclusion  is  that, 


var  Urm  -  gmmiE(Z|X) 


X&r  E(Z\X) 


<  < 


Var  Y, 

XaiE(Y\X), 

Var  {Y  - 
Var  (Y  ~  ^WE(Z\X))  , 

[  Var  (E(Y\X)  - 


In  particular,  Loh  (1995)  considers  the  case  of  Z  =  X  almost  surely  in  (35), 
and  Avramidis  and  Wilson  (1996)  fix  Ai  =  1  and  A3  =  0  in  (35).  From  the 
norm  perspective, 

II  E(Y\X)  -a-  \*2E(Z\X)\\2  =  ||  v  -  a||2  -  ||  v  -  E(Y\X)\\2  -  ||A^(Z|V)||2 


makes  precise  the  variance  eliminated  when  sampling  from  E(Y |AT)  — X^E^ZlX). 
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6  Control  Variates  and  Conditional  Monte  Carlo  from  a  Hilbert 
Space  Perspective 


We  now  discuss  how  CMC  and  CV  can  be  combined  to  reduce  variance  co¬ 
operatively;  the  results  of  this  section  appear  in  Loh  (1995),  and  Glynn  and 
Szechtman  (2002). 

Suppose  the  setting  of  Example  10:  There  exists  a  random  variable  X  e 
C2(Xl,  E,V)  such  that  the  moments  EX1  are  known  with  E|A"|*  <  oo  for  all 
i  =  1,2,....  Given  a  random  variable  Y  e  E.V),  the  goal  is  to  find 

a  =  EY  by  running  a  Monte  Carlo  simulation  that  uses  the  knowledge  about 
the  moments  of  X  to  increase  simulation  efficiency.  Suppose  we  can  sample 
from  either 

a)  E{Y\X)  -ZUKi**  -EX*) 
or 

b)  V-EtiAI(W-EW), 

to  form  the  standard  estimator  for  a,  where  the  A*  are  determined  by  applying 
Equation  (3)  on  E(Y\X)  and  on  the  controls  (X1  —  EX1, . . . ,  Xd—EXd).  From 
the  developments  of  Example  10  it  is  a  short  step  to: 

a)  Take  W  =  E{Y\X)  —  a  in  Equation  (31),  consequently: 

Var  (e(Y'|V)  -  J2  X*(Xi  -  EX*)^  0,  (36) 


as  d  — ■»  cx). 

b)  The  triangle  inequality  and  W  —  Y  —  a  in  Equation  (31)  result  in 


Var  [Y  -  J2  A*  (X*  -  EX*)  -»•  E  Var(V|V), 


(37) 


i= 1 


as  d  — >•  cx). 

The  interpretation  of  a)  is  that  CV  and  CMC  reduce  variance  concurrently: 
E(Y\X)  eliminates  the  E  Var(F|V)  part  of  Var  Y,  while  £?=i  X*(Xl  -  EX1) 
asymptotically  cancels  Yax  E(Y\X).  The  effect  of  Ef=i  A*(Xl  —  EX1)  in  part 
b)  is  to  asymptotically  eliminate  the  variance  component  due  to  E(Y\X)  when 
using  E(Y\X)  in  the  simulation  is  not  possible. 
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7  Weighted  Monte  Carlo 


In  this  section  we  consider  the  asymptotic  behavior  of  weighted  Monte  Carlo 
(WMC)  estimators,  for  a  large  class  of  objective  functions.  We  rely  on  Glasser- 
man  and  Yu  (2005),  and  Glasserman  (2004),  which  make  precise  the  connec¬ 
tion  between  WMC  and  CVs  for  separable  convex  objective  functions.  Initial 
results,  under  weaker  assumptions  and  just  for  one  class  of  objectives  were  ob¬ 
tained  in  Szechtman  and  Glynn  (2001),  and  in  Glynn  and  Szechtman  (2002). 
Applications  of  weighted  estimators  to  model  calibration  in  the  finance  con¬ 
text  are  presented  in  Avcllaneda  et  al.  (2001),  and  in  Avcllaneda  and  Gamba 
(2000). 

Consider  the  standard  CV  setting:  (Yi,  Xi), . . , ,  (Yn,  Xn)  are  i.i.d.  samples 
of  jointly  distributed  random  elements  ( Y ,  X)  e  (M,  Md)  with  non-singular 
covariance  matrix  S  and,  without  loss  of  generality,  AX  =  0  componentwise. 
The  goal  is  to  compute  a  =  EY  by  Monte  Carlo  simulation,  using  information 
about  the  means  AX  to  reduce  estimator  variance.  Let  /  :  M  — »  M  be  a  strictly 
convex  and  continuously  differentiable  function,  and  suppose  that  the  weights 

m*  m* 

^1,71?  *  *  *  ?  ^n,n 

n 

minimize  E  f(wk,n)  (38) 

k= 1 

1  n 

subject  to  —  E  «>*,»  =  1  (39) 

n  k=  1 

1  n 

-  E  wk,n^k  =  0.  (40) 

Then,  the  WMC  estimator  of  a  takes  the  form 

1  n 

Y\VMC  =  -  E  Wk,vXk- 


The  following  observations  review  some  key  properties  of  WMC:  The  weight 
applied  to  each  replication  i  is  w*n/n,  rather  than  the  weight  1/n  used  to  form 
the  sample  mean  Y.  A  feasible  set  of  weights  is  one  that  makes  Ywmc  unbi¬ 
ased  (cf.  constraint  (39)),  and  that  forces  the  weighted  average  of  the  control 
samples  to  match  their  known  mean  (cf.  constraint  (40)).  For  every  n  suffi¬ 
ciently  large  0  (=AX)  belongs  to  the  convex  hull  of  the  replicates  Xi, . . . ,  Xn, 
and  therefore  the  constraint  set  is  non-empty.  The  objective  function  in  (38), 
being  strictly  convex,  ensures  uniqueness  of  the  optimal  solution  if  the  optimal 
solution  is  finite.  If  wk)n  >  0, 1  <  k  <  n,  were  additional  constraints,  a  feasi¬ 
ble  set  of  weights  WitU, . . . ,  wn,n  would  determine  a  probability  mass  function 
l/nELi  5xfc(-)wM>  where  <5x(z)  =  1  if  z  =  x  and  is  equal  to  zero  otherwise. 
However,  as  discussed  in  Hesterberg  and  Nelson  (1998),  P(wkn  <  0)  =  o(n~p ) 
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uniformly  in  1  <  k  <  n  if  i?(||X||p)  <  oo  indicates  that  the  non- negativity 
constraints  are  asymptotically  not  binding;  see  Szechtman  and  Glynn  (2001) 
for  an  example  of  this  scenario. 

There  are  different  /’ s  depending  on  the  application  setting.  For  example: 
/( w)  =  —  logic  results  in  maximizing  empirical  likelihood;  discussed  in  Szecht¬ 
man  and  Glynn  (2001).  The  function  f{w )  =  —w  log  w  yields  an  entropy  max¬ 
imization  objective;  this  is  the  subject  of  Avellaneda  and  Gamba  (2000),  and 
Avellaneda  et  al.  (2001).  The  important  case  of  f[w )  =  w 2  is  considered  next, 
the  optimization  problem  being  to 


n 

minimize  ^  w\  n 
k=  1 


1  n 

subject  to  -  ^  wk,n  =  1  (41) 

n  k=  i 

1  n 

^  '  U>k,nX-k  =  0. 

Solving  the  optimization  problem  yields  (Glasserman  and  Yu,  2005)  optimal 
weights  given  by 

w*k  n  =  1  -  XTM-1(Xfe  -  X),  for  k  =  1, ... ,  n,  (42) 

where  M  G  Mdxd  is  the  matrix  with  elements  Mt  ^  =  1  jn  J2k=i(Xi  k~ X){Xj  k  — 
xy,  m-1  exists  for  all  n  large  enough  because  M  — >  Sxx  a.s.  componentwise. 
Rearranging  terms  immediately  gives 

1  n 

-  E  =  Ycv(Xn),  (43) 

"tei 

with  the  benefit  that  the  optimal  weights  do  not  depend  on  the  Yk,  which 
makes  this  approach  advantageous  when  using  CVs  for  quantile  estimation; 
see  Hesterberg  and  Nelson  (1998)  for  details. 

Regarding  Hilbert  spaces,  consider  the  space  M”  (cf.  Example  2)  and  the  set 

{j  n  2  n  1 

W  (n)  =  (w  ln, wn  n)  G  Mn  :  —  V  wk,n  =  1  and  -  V  wk:UKk  =  0 

n  n  ^ 

It  can  be  verified  that  A{n )  meets  the  conditions  of  Result  3  for  every  n  suffi¬ 
ciently  large,  and  consequently  it  is  fair  to  ask:  What  element  in  A{n )  is  closest 
to  l(n)  =  (1, . . . ,  1)  G  Mn?  That  is,  which  element  w*(n)  =  . . . ,  w*i  n)  G 

Mn 

minimizes  ||l(n)  —  w(n)|| 
subject  to  w(n)  G  A(n)? 
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This  problem  yields  the  same  solution  as  problem  (41);  doing  simple  algebra 
it  is  easy  to  verify  that  w *(n)  with  components  as  in  Equation  (42)  satisfies 
condition  (26).  The  conclusion  is  that  w *(n)  is  the  closest  point  in  A{n)  to 
l(n),  the  vector  of  crude  sample  weights.  Would  this  Hilbert  space  approach 
to  WMC  work  with  /’ s  that  are  not  quadratic?  Yes,  as  long  as  the  metric 
induced  by  the  inner  product  meets  the  defining  properties  of  a  metric. 


The  main  result  concerning  WMC  and  CV,  proved  in  Glasserman  and  Yu 
(2005)  under  certain  conditions  on  X,  Y,  /,  and  the  Lagrange  multipliers 
associated  with  constraints  (39)  and  (40),  is  that 

Ywmc  =  Ycv  +  ^(ttT1), 


and, 

VnfiwMC  ~~  a)  ^  N (0,  &WMc)i 


as  n  — >  oo  where 


2  2 
aWMC  =  aCVi 

and  Op(an )  stands  for  a  sequence  of  random  variables  (£n  :  n  >  1)  such  that 
for  all  e  >  0  and  some  constant  5,  P(|£n|  >  anS )  <  e.  The  last  result  provides 
support  to  the  statement  that  YWMc  and  YCy  are  asymptotically  identical. 


8  Stratification  Techniques 


In  this  section  we  discuss  stratification  methods  emphasizing  the  connection 
with  the  Hilbert  space  and  CVs  ideas  already  developed.  Refer  to  Fishman 
(1996),  Glasserman  et  al.  (1999),  and  Glynn  and  Szechtman  (2002)  for  more 
details. 


Suppose  that  we  wish  to  compute  a  =  EY,  for  some  random  variable  Y  G 
C2(fl,  P.V).  Let  X  G  C2(fl,P,  V).  The  method  of  stratification  arises  when 
there  is  a  collection  of  disjoint  sets  (“strata”)  (At  :  1  <  i  <  d)  in  the  range  of 
X  such  that  P(X  e  Uf=1Ai)  =  1  and  P(X  e  Ai)  =  Pi  is  known  for  every  1  < 
i  <  d.  Then,  assuming  that  one  can  obtain  i.i.d.  replicates  (Yjy.  :  1  <  k  <  rii) 
from  P(Y  G  -\X  G  Ai),  1  <  i  <  d,  the  estimator  of  a  given  by 


EjrEn. 

6=1  Ul  k= 1 


(44) 


is  unbiased,  where  n,  is  the  number  of  replicates  sampled  from  P(Y  G  -\X  G 

Ai). 

For  a  total  number  of  replications  n  =  Y)f=\  ni>  proportional  stratification 
allocates  nl  =  npi  samples  to  strata  Ai,  1  <  i  <  d,  where  for  simplicity  we 
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assume  that  the  rip,  are  integers.  The  estimator  of  Equation  (44)  is  then  called 
the  proportional  stratification  (PS)  estimator 


i  d  up, 

Yps  =  -  12  Yi,k, 

n  U. 


with  variance  given  by 


1  d 

VaxYps=-'ZlPiVai(Y\Z  =  i) 

n  i=  i  (45) 

=  -EVai(Y\Z), 
n 

where  the  random  variable  Z  =  J^i= i  H(X  G  A,). 

One  implication  of  Equation  (45)  is  that  if  Y  is  not  constant  inside  each 
strata,  then  Var  Yps  >  0,  so  that  proportional  stratification  does  not  eliminate 
the  variability  of  Y  inside  strata,  but  rather  the  variability  of  E(Y\Z)  across 
strata.  In  addition,  Equation  (45),  jointly  with  the  variance  decomposition 
formula  (19),  quantifies  the  per-replication  variance  reduction  achieved  by 
proportional  stratification:  EVtir(Y\Z)  =  Vary  —  Var E(Y\Z).  Observe  that 
although  PS  is  relatively  simple  to  implement,  it  does  not  provide  the  optimal 
sample  allocation  n,  per  strata;  see  Glasserman  (2004,  p.  217)  for  more  details. 

From  a  CV  perspective,  PS  acts  like  applying  E(Y\Z)  —  a  as  a  CV  on  Y ; 
Yps  achieves  the  same  variance  reduction  as  that  obtained  by  averaging  i.i.d. 
replications  of  Y  —  (E(Y\Z)  —  a).  Of  course,  sampling  from  the  distribution 
of  V  —  (E(Y\Z)  —  a)  is  impractical  because  a  is  unknown. 

Regarding  Hilbert  spaces,  Equation  (45)  is  simply 

nYaxYps  —  \\(I  —  Pa(z))Y\\2.  (46) 

In  addition,  Y'p$  satisfies  the  following  CLT: 

n1/2(yP5  —  a)  N(0,cr'pS )  as  n  — >  oo, 

where  a2PS  =  EVax(Y'\Z),  which  enables  the  construction  of  asymptotically 
valid  confidence  intervals  for  a. 


Post-stratification  offers  an  alternative  to  proportional  stratification  when 
sampling  from  P(Y  E  -\X  G  A,)  is  not  possible,  but  when  it  is  possible  to 
sample  from  the  distribution  of  (X,  Y).  Specifically,  we  construct  the  unbiased 
estimator 


Ypst  = 


G  ELi  Yp(Xk  e  A) 

hY  E”.,/(VeA) 


21 


Using  the  Delta  method  (cf.  Section  2)  it  is  easy  to  prove  a  CLT  for  Ypst- 

n1/2(YpST  -a)  ^  N(  0,  a2pST ), 

as  n  — >  oo,  where  crpST  =  EYar(Y\Z).  However,  for  every  stratum  we  know 
a  priori  that  EI( X  G  Ai)  =  Pi,  which  suggests  the  use  of  the  vector  ( I(X  G 
Ai)  —  Pi  :  1  <  i  <  rf)  as  a  control.  More  specifically,  an  unbiased  CV  estimator 
is  given  by 


1  n  (  d  \ 

Ycv( A)  =  -  E  U  -  E  M  AV  e  A)  -  Pi)  ■ 

n j= i  V  i=i  / 

Using  Equation  (3),  the  optimal  coefficients  A*,l  <  %  <  d  are  immediately 
found  to  be 

A*  =  E(Y\Z  =  i ),  for  1  <  i  <  d. 

That  is, 

1  11 

Ycv(  \')=-'E(Yi~(E(Y\Zj)-a)), 

“i=  i 

and 

VarlWA*)  =  -EVar(Y\Z). 
n 

Therefore,  n(Var  Ycv(X*)  —  Var Ypst)  ->  0  as  n  ->  oo.  With  a  little  more 
effort,  it  can  be  shown  that 

n1/2  (Ypst  ~  YCv{^*))  =>•  0, 

as  n  — >  oo.  In  other  words,  l^sr  and  Vcv(A*)  have  the  same  distribution  up 
to  an  error  of  order  op(n-1/2),  as  n  -a-  oo;  where  op(an)  denotes  a  sequence  of 
random  variables  (£n  :  n  >  1)  such  that  a”1^  0  as  n  — >  oo. 

Given  the  strata  :  1  <  i  <  d),  it  is  always  possible  to  find  finer  strata  that 
further  reduce  estimator  variance.  In  the  case  of  proportional  stratification, 
suppose  that  it  is  possible  to  split  each  stratum  A,  into  integer  nt  =  npi  strata 
(A,*  :  1  <  k  <  rij)  such  that  P( X  G  Aj,^)  =  1  /n;  i.e.,  the  bivariate  random 
vector  Vn  =  X)f=1  J2kL i(b  k)I(X  G  Aitk)  is  uniformly  distributed  on  the  lattice 
{(i,k)  '■  1  <  i  <  d,  1  <  k  <  rii}.  Assume  in  addition  that  it  is  possible  to 
sample  from  P(Y  G  -\X  G  A%^)-  Then  the  refined  proportional  stratification 
(rST)  estimator  is 

i  d  npi 

YrST  =  —  X!  5Z 

where  the  Ytk  are  sampled  from  P(Y  G  -\X  G  Aiyk)-  Proceeding  as  in  (45),  we 
arrive  at 

Var  YrST  =  1  AVar(V|Un). 
n 
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The  fact  that  \\E(Y\Vn)  -  EY  ||2  =  \\E(Y  \Vn)  -  E(Y\Z)f  +  \\E{Y\Z)  -  EY  ||2 
shows  that  Yar  E(Y\Vn)  >  Var E(Y\Z),  and  therefore 

Var  Yrsr  <  Var  Yps- 


With  regards  to  Example  9,  the  conditions  leading  to  Equation  (28)  apply,  so 
that  as  n  — >  oo 

Yai(E(Y\X)  -  E(Y\Vn))  -»•  0, 

and 

Var(y  -  E(Y \Vn))  ->  E  Var(V|X). 

In  particular,  nYaxYrsr  — >  £'Var(y|V).  This  result  should  come  as  no  sur¬ 
prise  because  as  n  grows  we  get  to  know  the  full  distribution  of  X,  not  unlike 
the  setting  of  Equations  (36)  -  (37):  rST  presumes  knowledge  of  an  increasing 
sequence  of  cr-algebras  that  converge  to  cr(X),  whereas  in  Equations  (36)  - 
(37)  we  have  information  about  the  full  sequence  of  moments  of  X. 


As  to  control  variates,  rST  produces  the  same  estimator  variance  as  the  stan¬ 
dard  CV  estimator  formed  by  i.i.d.  sampling  from  Y  —  (E(Y\X)  —  a),  as 
n  — >  oo.  Similar  to  (46),  we  can  write  nYaiYrST  — >  ||(/  —  Pa(x))Y ||2,  as 
n  — >  oo.  Finally,  the  CLT  satisfied  by  Yrsr  is 

n1/2(W5T-«)^iV(0,ar25T), 
as  n  — >  oo,  where  rfsT  =  EXax{Y\X). 


To  conclude  this  section,  we  mention  the  link  between  post-stratification  and 
WMC.  Write 


w 


* 

k,n 


X  r/( Xt  g  Aj) 


(47) 


for  1  <  k  <  n,  then  the  WMC  estimator  YWMC  =  Y7k=\  wt  rXk  equals  Ypst ■  It 
can  be  confirmed  that  the  weights  given  in  Equation  (47)  are  the  solution  of  the 
optimization  problem  with  objective  function  min  Y7k=\  n  an(I  constraints 
YX=iwk,nI(Xk  E  Ai)  =  Pi,  1  <  i  <  d,  and  J2k=iwk,n  =  1-  Interpreting  at  face 
value,  w*  with  elements  as  in  (47)  is  the  closest  point  in  the  set  determined 
by  the  constraints  to  the  vector  consisting  of  n  ones. 


9  Latin  Hypercube  Sampling 


We  now  discuss  the  method  of  Latin  hypercube  sampling  (LHS)  from  a  Hilbert 
space  and  CV  perspective.  McKay  et  al.  (1979),  Stein  (1987),  Owen  (1992), 
and  Loh  (1996)  are  standard  references  for  LHS.  We  rely  on  Mathe  (2000), 
which  gives  a  good  account  of  LHS  from  a  Hilbert  space  point  of  view. 
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Avramidis  and  Wilson  (1996)  is  also  a  valuable  reference  for  the  issues  we 
consider. 

Suppose  the  setting  of  Examples  3  and  8:  We  have  mutually  independent 
random  variables  X1, . . , ,  Xd}  each  with  known  distribution  function  Ft,  and 
the  goal  is  to  compute 

a  =  Ef{X)  =  j  /(x)dF(x) 

via  simulation,  where  /  :  — »  M  is  a  square  integrable  function  with  respect 

to  F(x)  =  Ylnd=i  Fi(Xi),  x  =  (xu  . . .  ,xd),  and  X  =  (Ah, . .  .,Xd). 

LHS  generates  samples  of  X  as  follows: 

(i)  Tile  [0,l)d  into  nd  hypercubes  A . =  nf=i[^1,  y),  k  = 

i  =  1, . . . ,  d,  each  of  volume  n~d. 

(ii)  Generate  d  uniform  independent  permutations  (tti (•) , . . . ,  7rd(-))  of  {1, ... ,  n}. 

(iii)  Use  the  output  of  (ii)  to  choose  n  hypercubes  from  (i):  Ani(k)t..^d{k)  for 

the  k'  th  tile,  k  —  ri. 

(iv)  Uniformly  select  a  point  from  within  each  A and  generate  Xiik 
by  inverting  Ft  at  that  point. 

Notice  that  (i)  -  (iv)  are 

Xl>k  =  F-1  ~  Ui(k)^j  ?  i  <i<d,  and  1  <  k  <  n,  (48) 

where  the  Ui(k)  are  i.i.d.  uniform  on  [0, 1].  The  LffS  estimator  is  the  average 
of  the  n  samples  /(Xfc),  each  Xfc  =  (Ah  k, . . . ,  Xdk)  obtained  according  to 
(48): 

1  n 

YlHS  =  ~ 

As  in  refined  stratification,  given  a  sample  size  n,  LHS  assigns  one  sample  to 
each  strata  Al)k  given  by 

Ai,k  =  F-1  ^r1  g)),l<i<d,l<ifc<n, 

with  the  sample  uniformly  distributed  within  the  strata.  Where  refined  pro¬ 
portional  stratification  applied  to  a  particular  Xt  asymptotically  eliminates 
the  variance  due  to  U(/(X)|Jh)  (cf.  Example  8  for  the  definition  of  F,)  along 
just  one  dimension  i,  LHS  asymptotically  eliminates  Var  F(f(X)\Fi)  at 
the  same  rate  used  by  rST  used  to  eliminate  only  Var E(f  (X)\Fi). 

Figure  2  illustrates  LHS  with  d  —  2,  n  —  4.  Each  dot  in  the  lower  left  square  is 
a  sample  from  (7 r* (A:)  —  1  +  Ui(k))/n.  The  position  of  each  dot  within  a  square 
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Fig.  2.  Latin  hypercube  sampling 

is  uniformly  distributed  according  to  the  Ut(k)]  the  permutations  VTj(/c)  ensure 
that  there  is  just  one  dot  per  row  and  per  column.  The  lower  right  region 
depicts  F\  and  the  four  equiprobable  strata  for  X\,  with  one  sample  point 
Fr'dmik)  —  1  +  U\(k))/n)  per  strata  the  final  output  are  the  samples 
Xiti, . . .  In  the  upper  left  F2  is  pictured  with  the  axes  inverted;  it  has 

the  same  explanation  as  that  of  F\ ,  with  the  final  output  being  the  samples 
X2li i  ■  ■  ■  i  X2±. 

Stein  (1987)  demonstrates  that  when  /  e  C2(dF), 

VmYlhs=  ^Var  ^/(X)-f:^(/(X)|^  +o(n~l),  (49) 

as  n  — >  oo,  which  makes  precise  the  variance  reduction  achieved  by  LHS,  up 
to  order  o(n-1). 

The  CLT  satisfied  by  YLHS,  proved  in  Owen  (1992)  under  the  condition  that 
/(F_1(-))  is  bounded  on  [0,  l]d  is 

n1/2(YLHS  -  a)  =>  N (0,  a\HS),  (50) 
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as  n  — >  oo,  where 


^«s  =  Varf/(X)-f:B(/(X)|^) 

V  i= i 

Equation  (50)  provides  theoretical  support  for  the  construction  of  a  valid 
confidence  interval  for  a,  whose  width  depends  on  ctlhs ■  This  term  is  generally 
not  known  prior  to  the  simulation,  nor  easily  estimated  from  the  simulation 
data.  Section  3  of  Owen  (1992)  deals  with  the  estimation  of  o2LHS  from  LHS 
data,  and  shows  how  to  use  the  LHS  samples  to  find  an  estimator  for 
which  is  within  n-1/2  in  probability  of  (T2lhs  as  n  — >  oo.  This  permits  the 
formation  of  an  asymptotically  valid  confidence  interval  for  a. 

As  to  standard  Monte  Carlo,  Owen  (1997)  proves  that  LHS  is  no  less  efficient 
because 

VarY^s  <  — 

n 

for  all  n  >  2  and  d  >  2.  In  other  words,  even  if  the  variance  eliminated  by  LHS, 
Var  J2i=i  is  small,  LHS  with  n  samples  is  no  less  efficient  than 

standard  Monte  Carlo  with  n  —  1  replications.  Of  course,  the  only  measure  of 
efficiency  in  this  argument  is  variance,  a  more  complete  analysis  would  take 
into  account  the  computational  cost  of  generating  sample  variates. 

Example  8  leads  into  the  Hilbert  interpretation  of  LHS.  In  particular,  Equa¬ 
tions  (25)  and  (49)  imply 

hm  nVaxYLHS  =  Var  (/(X)  -  f>(/(X)  1^)1  =  W  ~  Pu)f  l|2-  (51) 

That  is,  LHS  takes  the  remainder  from  projecting  /  on  the  subspace  M  deter¬ 
mined  by  the  span  of  the  linear  combinations  of  univariate  functions.  Using 
the  last  equation,  LHS  eliminates  variance  because 

Var/(X)  =  ||(/-P0)/||2 

=  || PM(I  -  P0)f\\2  +  ||(I  -  Pm)(I  -  Po)f\\2 

>  || (J-PM)(/-P0)/f 

=  11  (I-PM)f\\2, 


where 


II  Pm(I  -  Po)f\\2  =  E  II  Pii1  -  Po)f\\2  =  it  VarE(/(X)|  Pt) 

i= 1  i= 1 

is  the  variance  eliminated  by  LHS.  If  /  e  M,  then  Equation  (51)  also  shows 
that  n  Var  Ylhs  — >  0  as  n  — >  oo;  in  other  words,  LHS  asymptotically  eliminates 
all  the  variance  of  /  if  /  is  a  sum  of  univariate  functions. 
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Much  more  can  be  said  about  Var  /(X).  Suppose  for  simplicity  that  the  Xt  are 
Uniform  [0, 1]  random  variables,  so  that  a  =  Jg1  f(x)dx.  Let  u  C  {1,2,...,  cZ} 
and  define  dx~u  =  Yl,gu  d'xr  Then,  if  /  is  square  integrable,  there  is  a  unique 
recursion 

/u(x)  =  f  /(x)dx~u  -  ^  /„(x)  (52) 

J  vCu 

with  the  property  that  Jq  fu(x)dxj  =  0  for  every  j  6  u,  and 

/(x)  =  E  /«(x); 

uC{l,2,...,d} 

see,  for  example,  Jiang  (2003)  for  a  proof.  Recursion  (52)  actually  is  the  Gram- 
Schmidt  process,  and  it  splits  /  into  2d  orthogonal  components  such  that 

J  /u(x)/u(x)rfx  =  0, 

for  and 

=  E  <  (53) 

|u|>0 

where  a2  =  /(/(x)  —  a)2dx  is  the  variance  of  /(X),  and  cr2  —  f  f2(x)dx  is 
the  variance  of  /U(X).  The  conclusion  in  this  context  is  that  LHS  eliminates 
the  variance  of  the  fu  for  all  u  with  |tt|  =  1.  The  setting  of  this  paragraph  is 
known  as  functional  AN OVA;  see  Chapter  13  for  more  details  on  this  topic. 

Considering  control  variates,  we  can  say  that  the  estimator  formed  by  averag¬ 
ing  i.i.d.  replicates  of  /(X)  —  YA=i{E(f  (X.)\Ti)  —  a)  has  the  same  asymptotic 
variance  as  Y^hs-  Moreover,  given  a  zero-mean  control  variate  h(X),  h  a  de¬ 
terministic  function,  obtain  n  Latin  hypercube  samples  of  X  using  (48),  and 
form  the  combined  LHS+CV  estimator 

1  n 

YLHs+cv( A)  =  -  E  (/(Xfc)  -  A h(Xfc))  • 

Then,  n(Var  Yj^s+cviY)  —  Va tYlhs)  —>  0  for  any  control  of  the  type  h(X)  = 
J2i= i  hi(Xi)  because  h  E  M  and  property  a)  of  the  projection  operator  together 
imply  (/  —  Pm)  (f  —  A h)  =  (/  —  Pm).!'  for  all  A  G  M.  Using  Equations  (4)  and 
(49): 

Var  Vuis  i r\-( A'  j  =  VaiYLHS(l  -  p2), 
where  p 2  is  the  square  of  the  correlation  coefficient  between 

/(X)  -fi(/(X) \FZ)  and  fe(X)  -  £ E(h(X)\F,). 

1=1  1=1 

In  other  words,  a  good  CV  for  /(X)  in  the  LHS  context  is  one  that  maximizes 
the  absolute  value  of  the  correlation  coefficient  of  its  non-additive  part  with 
the  non-additive  part  of  /(X).  Notice  that  A*  is  the  optimal  CV  coefficient 
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associated  with  the  response  /(X)  —  E(/(X) |A))  and  the  CV  h(X)  — 

J2i=i  S(/i(X)|^rj);  refer  to  Owen  (1992)  for  the  estimation  of  A*  from  sample 
data. 

Additional  guidance  for  effective  CVs  in  the  LHS  setting  is  provided  by  (53): 
Choose  a  CV  with  non-additive  part  that  is  highly  correlated  with  fu's,  u \  >  1, 
that  have  large. 

As  regards  weighted  Monte  Carlo,  consider  using  LHS  to  generate  replicates 
Xi, . . . ,  Xn,  and  the  optimization  problem 

n 

minimize  ^  w\  n 
k=  1 


subject  to  E  wk,n  =  1  (54) 

k= 1 

n  1 

E  wk,nl {xi,k  e  Aij)  =  -,  for  i  =  1, . . . ,  d  and  j  =  1, . . . ,  n. 
k= i  n 

Clearly  w%n  =  1/n,  k  —  1, . . . ,  n,  is  feasible  for  (54),  and  it  is  also  optimal  by 
the  developments  of  Section  7,  so  that  the  WMC  estimator  J2k=iwtnf0^k) 
coincides  with  Ylhs  f°r  every  n  >  1.  For  d  —  1,  problem  (54)  furnishes  a 
WMC  estimator  that  equals  YrsT]  in  other  words,  for  d  =  1  LHS  yields  the 
same  variance  reduction  as  rST  in  the  limit  as  n  — >  oo. 


10  A  Numerical  Example 


In  this  section  we  present  a  numerical  example  that  supports  many  of  the  re¬ 
sults  discussed  in  the  chapter.  Consider  the  stochastic  activity  network  (SAN) 
of  Loh  (1995)  depicted  in  Figure  3  (see  also  the  SAN  discussion  in  Chapter 
1)  with  arcs  Xi,  Xi,  A3,  X4,  X$  that  are  independent  random  variables  that 
represent  activity  durations.  The  problem  of  numerically  computing  the  ex¬ 
pected  duration  of  the  shortest  path  that  leads  from  the  source  node  a  to  the 
sink  node  z  involves  estimating  a  =  EY,  where  Y  =  min{Ab  +  X2,  X{  +  X3  + 
X5,X4  +  X5}.  For  the  purposes  of  this  example,  we  assume  that  the  Xj’s  are 
exponentially  distributed  with  parameters  /ij  =  1.1,  /i2  =  2.7,  n3  =  1.1,  ^4  = 
2.5.//;,  1.2. 

Given  an  inner  sample  size  n,  we  wish  to  appraise  the  variability  of: 

•  The  crude  Monte  Carlo  estimator  Y. 

•  The  control  variates  estimator  Vc,y(An),  using  the  first  moments  of  the  AYs 
as  control  variates. 


•  The  conditional  Monte  Carlo  estimator  YCmc ■  Because  the  durations  of  the 
three  paths  that  lead  from  a  to  z  are  conditionally  independent  given  X\ 
and  X5,  E(Y\Xi,X$)  can  be  found  analytically;  see  Loh  (1995,  p.  103). 

•  The  weighted  Monte  Carlo  estimator  Ywmc ,  with  f(w)  =  —  login  in  Equa¬ 
tion  (38). 

•  The  stratification  estimator  YrgT,  where  we  stratify  on  X\. 

•  The  Latin  hypercube  estimator  Y'lhs  applied  to  X±, . . . ,  X5. 

In  order  to  compare  the  variance  of  these  estimators,  we  repeat  m  =  1000 
times  the  simulation  to  obtain  Y(m), . . . ,  YLHs(m )  by  averaging  Y, . . . ,  YLH$ 
over  m  .  The  (sample)  standard  deviations  of  these  six  estimators  are  s(m,  n) , 
scv(m,n),  sCMc(rn,n),  sWMc(rn,n),  srST(rn,n),  and  sLHs(rn,n). 

The  results  are  summarized  in  Table  1.  As  expected,  scv(m,  n)  ~  %mc(^i  n), 
and  SLHs(m , n)  <  SrST^m,  n)  for  each  n.  Notice  that  sr5r("h  n)  and  SLHs(m , n) 
behave  like  a  constant  divided  by  n1^2  for  each  n,  which  indicates  that  rST 
and  LHS  achieve  their  variance  reduction  potential  by  n  =  100. 


11  Conclusions 


We  presented  various  variance  reduction  techniques  for  terminating  simula¬ 
tions  in  the  Hilbert  space  setting,  establishing  connections  with  CV,  CMC, 
and  WMC  whenever  possible.  It  is  the  geometric  interpretation  of  Result  2 
that  makes  this  approach  especially  tractable. 

There  are,  however,  several  topics  missing  from  our  coverage  where  Hilbert 
space  theory  might  yield  valuable  insights.  Consider  for  instance  the  case  of 
variance  reduction  techniques  for  steady-state  simulations  that  have  a  suitable 
martingale  representation;  see,  for  example,  Henderson  and  Glynn  (2002).  It  is 


Fig.  3.  Stochastic  Activity  Network 
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Parameter 

Sample  size 

n 

100 

1000 

10000 

s(m,  n) 

0.0529 

0.0162 

0.0053 

scv(rn,,n ) 

0.0392 

0.0112 

0.0036 

scMc(rn,n ) 

0.0328 

0.0102 

0.0033 

swMc(m,n) 

0.0395 

0.0116 

0.0038 

srsT{m,n) 

0.0456 

0.0144 

0.0044 

SLHs(rn,,n) 

0.03 

0.0093 

0.0030 

Table  1.  SAN  Numerical  Example 

well-known  that  square  integrable  martingale  differences  have  a  simple  inter¬ 
pretation  in  the  Hilbert  space  framework,  which  suggests  that  it  might  be  pos¬ 
sible  to  obtain  additional  insights  when  dealing  with  such  techniques.  Another 
area  of  interest  is  the  Hilbert  space  formulation  of  CVs  in  the  multi-response 
setting,  where  Y  is  a  random  vector;  see  Rubinstein  and  Marcus  (1985)  for 
relevant  results.  The  combination  of  importance  sampling  (cf.  Chapter  12) 
with  CVs  also  can  be  studied  in  the  Hilbert  space  setting;  see,  for  example, 
Hesterberg  (1995),  and  Owen  and  Zhou  (1999). 
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